Dataset statistics
| Number of variables | 11 |
|---|---|
| Number of observations | 382154 |
| Missing cells | 719119 |
| Missing cells (%) | 17.1% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 32.1 MiB |
| Average record size in memory | 88.0 B |
Variable types
| NUM | 6 |
|---|---|
| CAT | 3 |
| BOOL | 2 |
Warnings
| Tanggal_Asuransi has a high cardinality: 848 distinct values | High cardinality |
|---|---|
| Gender has 31768 (8.3%) missing values | Missing |
| Umur has 96258 (25.2%) missing values | Missing |
| Izin_Mengemudi has 76647 (20.1%) missing values | Missing |
| Kode_Wilayah has 84074 (22.0%) missing values | Missing |
| Tanggal_Asuransi has 78084 (20.4%) missing values | Missing |
| Tahun_Kendaraan has 66440 (17.4%) missing values | Missing |
| Biaya has 126537 (33.1%) missing values | Missing |
| Sourcing_Channel has 83645 (21.9%) missing values | Missing |
| Hari_Diasuransikan has 75666 (19.8%) missing values | Missing |
| id has unique values | Unique |
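As a quick consistency check, the per-variable missing counts listed above should add up to the dataset-level "Missing cells" figure. The sketch below copies the counts from this report into a plain dictionary and recomputes the overall percentage; the names `missing`, `n_rows` and `n_cols` are illustrative only.

```python
# Missing-value counts copied from the per-variable warnings in this report.
missing = {
    "Gender": 31768,
    "Umur": 96258,
    "Izin_Mengemudi": 76647,
    "Kode_Wilayah": 84074,
    "Tanggal_Asuransi": 78084,
    "Tahun_Kendaraan": 66440,
    "Biaya": 126537,
    "Sourcing_Channel": 83645,
    "Hari_Diasuransikan": 75666,
}
n_rows, n_cols = 382154, 11  # observations and variables from "Dataset statistics"

total_missing = sum(missing.values())
pct_missing = 100 * total_missing / (n_rows * n_cols)

print(total_missing)
print(f"{pct_missing:.1f}%")
```

The two variables not listed (id and Target) have no missing values, so they contribute nothing to the total.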
Reproduction
| Analysis started | 2021-03-21 09:27:03.984535 |
|---|---|
| Analysis finished | 2021-03-21 09:27:43.899546 |
| Duration | 39.92 seconds |
| Software version | pandas-profiling v2.9.0 |
| Download configuration | config.yaml |
id
| Distinct | 382154 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 234392.9535 |
|---|---|
| Minimum | 1 |
| Maximum | 508145 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 2.9 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 22939.65 |
| Q1 | 115006.25 |
| median | 230461.5 |
| Q3 | 345434.75 |
| 95-th percentile | 471209.05 |
| Maximum | 508145 |
| Range | 508144 |
| Interquartile range (IQR) | 230428.5 |
Descriptive statistics
| Standard deviation | 139527.4873 |
|---|---|
| Coefficient of variation (CV) | 0.5952716806 |
| Kurtosis | -1.058717575 |
| Mean | 234392.9535 |
| Median Absolute Deviation (MAD) | 115213 |
| Skewness | 0.1322865442 |
| Sum | 8.957420474e+10 |
| Variance | 1.946791972e+10 |
| Monotonicity | Not monotonic |
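Several of the figures above are simple functions of one another. The following sketch recomputes the range, IQR, coefficient of variation, variance and sum for `id` from the other reported values; it only restates standard definitions and uses the rounded numbers printed in this report, so small rounding differences are expected.

```python
import math

# Values copied from the quantile and descriptive statistics of `id`.
n = 382154                      # observations (id has no missing values)
minimum, maximum = 1, 508145
q1, q3 = 115006.25, 345434.75
mean, std = 234392.9535, 139527.4873

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
cv = std / mean                   # Coefficient of variation (CV)
variance = std ** 2               # Variance
total = mean * n                  # Sum

print(value_range, iqr)
print(f"{cv:.6f} {variance:.4e} {total:.4e}")
```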
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 4094 | 1 | < 0.1% | |
| 432518 | 1 | < 0.1% | |
| 90489 | 1 | < 0.1% | |
| 96634 | 1 | < 0.1% | |
| 94587 | 1 | < 0.1% | |
| 84348 | 1 | < 0.1% | |
| 82301 | 1 | < 0.1% | |
| 88446 | 1 | < 0.1% | |
| 86399 | 1 | < 0.1% | |
| 434561 | 1 | < 0.1% | |
| Other values (382144) | 382144 | > 99.9% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 1 | < 0.1% | |
| 3 | 1 | < 0.1% | |
| 4 | 1 | < 0.1% | |
| 6 | 1 | < 0.1% | |
| 9 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 508145 | 1 | < 0.1% | |
| 508144 | 1 | < 0.1% | |
| 508143 | 1 | < 0.1% | |
| 508141 | 1 | < 0.1% | |
| 508140 | 1 | < 0.1% |
Gender
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 31768 |
| Missing (%) | 8.3% |
| Memory size | 2.9 MiB |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| Pria | 192814 | 50.5% | |
| Wanita | 157572 | 41.2% | |
| (Missing) | 31768 | 8.3% |
Frequencies of value counts
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Histogram of lengths of the category
Length
| Max length | 6 |
|---|---|
| Median length | 4 |
| Mean length | 4.741523051 |
| Min length | 3 |
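The minimum length of 3 is shorter than both category labels ("Pria" and "Wanita"), which suggests that missing entries are being measured as a 3-character placeholder such as `nan`. That is an inference from the numbers, not something the report states. Under that assumption, the reported mean length can be reproduced from the frequency table:

```python
# Counts copied from the Gender frequency table in this report.
counts = {"Pria": 192814, "Wanita": 157572}
n_missing = 31768

# Assumption: each missing entry is measured as the 3-character string "nan".
placeholder = "nan"

total_chars = sum(len(label) * c for label, c in counts.items())
total_chars += len(placeholder) * n_missing
n_rows = sum(counts.values()) + n_missing

mean_length = total_chars / n_rows
print(f"{mean_length:.6f}")
```

If the assumption holds, the length statistics of the other categorical variables are affected by their missing values in the same way.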
Umur
| Distinct | 66 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 96258 |
| Missing (%) | 25.2% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 38.91659205 |
|---|---|
| Minimum | 20 |
| Maximum | 85 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 2.9 MiB |
Quantile statistics
| Minimum | 20 |
|---|---|
| 5-th percentile | 21 |
| Q1 | 24 |
| median | 33 |
| Q3 | 52 |
| 95-th percentile | 71 |
| Maximum | 85 |
| Range | 65 |
| Interquartile range (IQR) | 28 |
Descriptive statistics
| Standard deviation | 16.70679967 |
|---|---|
| Coefficient of variation (CV) | 0.4292976026 |
| Kurtosis | -0.8350756175 |
| Mean | 38.91659205 |
| Median Absolute Deviation (MAD) | 11 |
| Skewness | 0.6477650251 |
| Sum | 11126098 |
| Variance | 279.1171551 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 24 | 23033 | 6.0% | |
| 23 | 21560 | 5.6% | |
| 25 | 18636 | 4.9% | |
| 22 | 18367 | 4.8% | |
| 21 | 13996 | 3.7% | |
| 26 | 9776 | 2.6% | |
| 27 | 7748 | 2.0% | |
| 28 | 6449 | 1.7% | |
| 50 | 6316 | 1.7% | |
| 51 | 5789 | 1.5% | |
| Other values (56) | 154226 | 40.4% | |
| (Missing) | 96258 | 25.2% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 20 | 5735 | 1.5% | |
| 21 | 13996 | 3.7% | |
| 22 | 18367 | 4.8% | |
| 23 | 21560 | 5.6% | |
| 24 | 23033 | 6.0% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 85 | 10 | < 0.1% | |
| 84 | 14 | < 0.1% | |
| 83 | 24 | < 0.1% | |
| 82 | 32 | < 0.1% | |
| 81 | 52 | < 0.1% |
Izin_Mengemudi
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 76647 |
| Missing (%) | 20.1% |
| Memory size | 2.9 MiB |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 305145 | 79.8% | |
| 0 | 362 | 0.1% | |
| (Missing) | 76647 | 20.1% |
Kode_Wilayah
| Distinct | 53 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 84074 |
| Missing (%) | 22.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 26.40603194 |
|---|---|
| Minimum | 0 |
| Maximum | 52 |
| Zeros | 1426 |
| Zeros (%) | 0.4% |
| Memory size | 2.9 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 6 |
| Q1 | 15 |
| median | 28 |
| Q3 | 35 |
| 95-th percentile | 47 |
| Maximum | 52 |
| Range | 52 |
| Interquartile range (IQR) | 20 |
Descriptive statistics
| Standard deviation | 13.16317898 |
|---|---|
| Coefficient of variation (CV) | 0.4984913678 |
| Kurtosis | -0.8581445102 |
| Mean | 26.40603194 |
| Median Absolute Deviation (MAD) | 10 |
| Skewness | -0.1174582097 |
| Sum | 7871110 |
| Variance | 173.2692808 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 28 | 83857 | 21.9% | |
| 8 | 26475 | 6.9% | |
| 46 | 15718 | 4.1% | |
| 41 | 14847 | 3.9% | |
| 15 | 10164 | 2.7% | |
| 30 | 9934 | 2.6% | |
| 29 | 9110 | 2.4% | |
| 50 | 7911 | 2.1% | |
| 11 | 7302 | 1.9% | |
| 3 | 7271 | 1.9% | |
| Other values (43) | 105491 | 27.6% | |
| (Missing) | 84074 | 22.0% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 1426 | 0.4% | |
| 1 | 724 | 0.2% | |
| 2 | 2868 | 0.8% | |
| 3 | 7271 | 1.9% | |
| 4 | 1398 | 0.4% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 52 | 210 | 0.1% | |
| 51 | 154 | < 0.1% | |
| 50 | 7911 | 2.1% | |
| 49 | 1342 | 0.4% | |
| 48 | 3316 | 0.9% |
Tanggal_Asuransi
| Distinct | 848 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 78084 |
| Missing (%) | 20.4% |
| Memory size | 2.9 MiB |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 7/29/2019 | 468 | 0.1% | |
| 10/4/2019 | 467 | 0.1% | |
| 2/7/2020 | 454 | 0.1% | |
| 2/18/2020 | 454 | 0.1% | |
| 2/6/2020 | 453 | 0.1% | |
| 2/23/2020 | 452 | 0.1% | |
| 1/3/2020 | 452 | 0.1% | |
| 8/1/2019 | 448 | 0.1% | |
| 8/23/2019 | 447 | 0.1% | |
| 9/6/2019 | 447 | 0.1% | |
| Other values (838) | 299528 | 78.4% | |
| (Missing) | 78084 | 20.4% |
Frequencies of value counts
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Histogram of lengths of the category
Length
| Max length | 10 |
|---|---|
| Median length | 9 |
| Mean length | 7.716483407 |
| Min length | 3 |
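Tanggal_Asuransi is stored as text, which is why it is profiled as a high-cardinality categorical rather than as a date. The values shown (for example `2/18/2020` and `7/29/2019`) look like month/day/year strings without zero padding. A minimal sketch of parsing them with the standard library follows; the helper name `parse_tanggal` and the treatment of missing values are assumptions made for illustration.

```python
from datetime import date, datetime
from typing import Optional


def parse_tanggal(value: Optional[str]) -> Optional[date]:
    """Parse an M/D/YYYY string; return None for missing values."""
    # Assumption: missing entries arrive as None, an empty string, or "nan".
    if value is None or value == "" or value.lower() == "nan":
        return None
    # %m and %d also accept single-digit months and days such as "7/29/2019".
    return datetime.strptime(value, "%m/%d/%Y").date()


samples = ["7/29/2019", "10/4/2019", "2/18/2020", None]
print([parse_tanggal(s) for s in samples])
```

Once parsed, the column can be summarised by its date range or by month instead of by 848 separate string categories.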
Tahun_Kendaraan
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 66440 |
| Missing (%) | 17.4% |
| Memory size | 2.9 MiB |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1-2 Tahun | 150132 | 39.3% | |
| <1 Tahun | 149957 | 39.2% | |
| >2 Tahun | 15625 | 4.1% | |
| (Missing) | 66440 | 17.4% |
Frequencies of value counts
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Histogram of lengths of the category
Length
| Max length | 9 |
|---|---|
| Median length | 8 |
| Mean length | 7.523574266 |
| Min length | 3 |
Biaya
| Distinct | 46453 |
|---|---|
| Distinct (%) | 18.2% |
| Missing | 126537 |
| Missing (%) | 33.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 31183.75678 |
|---|---|
| Minimum | 2630 |
| Maximum | 540165 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 2.9 MiB |
Quantile statistics
| Minimum | 2630 |
|---|---|
| 5-th percentile | 2630 |
| Q1 | 24426 |
| median | 31887 |
| Q3 | 40007 |
| 95-th percentile | 59165.2 |
| Maximum | 540165 |
| Range | 537535 |
| Interquartile range (IQR) | 15581 |
Descriptive statistics
| Standard deviation | 18392.30559 |
|---|---|
| Coefficient of variation (CV) | 0.5898040354 |
| Kurtosis | 36.42634577 |
| Mean | 31183.75678 |
| Median Absolute Deviation (MAD) | 7791 |
| Skewness | 2.148623328 |
| Sum | 7971098357 |
| Variance | 338276904.8 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 2630 | 44088 | 11.5% | |
| 69856 | 133 | < 0.1% | |
| 45179 | 38 | < 0.1% | |
| 38452 | 35 | < 0.1% | |
| 70720 | 33 | < 0.1% | |
| 72544 | 29 | < 0.1% | |
| 38287 | 28 | < 0.1% | |
| 31102 | 25 | < 0.1% | |
| 36086 | 24 | < 0.1% | |
| 34885 | 24 | < 0.1% | |
| Other values (46443) | 211160 | 55.3% | |
| (Missing) | 126537 | 33.1% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 2630 | 44088 | 11.5% | |
| 6466 | 1 | < 0.1% | |
| 9816 | 1 | < 0.1% | |
| 10004 | 1 | < 0.1% | |
| 10148 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 540165 | 4 | < 0.1% | |
| 508073 | 1 | < 0.1% | |
| 495106 | 1 | < 0.1% | |
| 472042 | 4 | < 0.1% | |
| 448156 | 1 | < 0.1% |
Sourcing_Channel
| Distinct | 153 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 83645 |
| Missing (%) | 21.9% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 110.8720072 |
|---|---|
| Minimum | 1 |
| Maximum | 163 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 2.9 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 26 |
| Q1 | 26 |
| median | 152 |
| Q3 | 152 |
| 95-th percentile | 160 |
| Maximum | 163 |
| Range | 162 |
| Interquartile range (IQR) | 126 |
Descriptive statistics
| Standard deviation | 57.8626207 |
|---|---|
| Coefficient of variation (CV) | 0.5218866526 |
| Kurtosis | -1.282168011 |
| Mean | 110.8720072 |
| Median Absolute Deviation (MAD) | 8 |
| Skewness | -0.7751767326 |
| Sum | 33096292 |
| Variance | 3348.082875 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 152 | 120260 | 31.5% | |
| 26 | 70813 | 18.5% | |
| 124 | 26238 | 6.9% | |
| 160 | 21045 | 5.5% | |
| 156 | 10106 | 2.6% | |
| 157 | 6739 | 1.8% | |
| 154 | 5883 | 1.5% | |
| 122 | 4952 | 1.3% | |
| 151 | 3279 | 0.9% | |
| 163 | 2972 | 0.8% | |
| Other values (143) | 26222 | 6.9% | |
| (Missing) | 83645 | 21.9% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 948 | 0.2% | |
| 2 | 4 | < 0.1% | |
| 3 | 505 | 0.1% | |
| 4 | 515 | 0.1% | |
| 6 | 4 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 163 | 2972 | 0.8% | |
| 160 | 21045 | 5.5% | |
| 159 | 52 | < 0.1% | |
| 158 | 492 | 0.1% | |
| 157 | 6739 | 1.8% |
Hari_Diasuransikan
| Distinct | 290 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 75666 |
| Missing (%) | 19.8% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 154.1689952 |
|---|---|
| Minimum | 10 |
| Maximum | 299 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 2.9 MiB |
Quantile statistics
| Minimum | 10 |
|---|---|
| 5-th percentile | 24 |
| Q1 | 81 |
| median | 154 |
| Q3 | 227 |
| 95-th percentile | 285 |
| Maximum | 299 |
| Range | 289 |
| Interquartile range (IQR) | 146 |
Descriptive statistics
| Standard deviation | 83.72084959 |
|---|---|
| Coefficient of variation (CV) | 0.5430459574 |
| Kurtosis | -1.201810651 |
| Mean | 154.1689952 |
| Median Absolute Deviation (MAD) | 73 |
| Skewness | 0.004187327762 |
| Sum | 47250947 |
| Variance | 7009.180657 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 256 | 1167 | 0.3% | |
| 54 | 1146 | 0.3% | |
| 73 | 1145 | 0.3% | |
| 56 | 1132 | 0.3% | |
| 63 | 1131 | 0.3% | |
| 31 | 1131 | 0.3% | |
| 22 | 1127 | 0.3% | |
| 160 | 1126 | 0.3% | |
| 37 | 1124 | 0.3% | |
| 13 | 1123 | 0.3% | |
| Other values (280) | 295136 | 77.2% | |
| (Missing) | 75666 | 19.8% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 10 | 1104 | 0.3% | |
| 11 | 1084 | 0.3% | |
| 12 | 1008 | 0.3% | |
| 13 | 1123 | 0.3% | |
| 14 | 995 | 0.3% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 299 | 1026 | 0.3% | |
| 298 | 1082 | 0.3% | |
| 297 | 1020 | 0.3% | |
| 296 | 1071 | 0.3% | |
| 295 | 1041 | 0.3% |
Target
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 2.9 MiB |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 319553 | 83.6% | |
| 1 | 62601 | 16.4% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
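The definition above can be sketched in a few lines of plain Python. The function name `pearson_r` and the sample data are made up for illustration; this is not the code the report itself runs.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # The 1/n factors of the covariance and standard deviations cancel out,
    # so plain sums of products and squares are enough.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

r = pearson_r(x, y)
# Shifting and rescaling either variable by a positive factor leaves r unchanged.
r_transformed = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
print(round(r, 6), round(r_transformed, 6))
```

Rescaling by a negative factor would flip the sign of r but not its magnitude.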
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
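A minimal sketch of that calculation, assuming ties receive the average of their ranks (a common convention, not something stated in this report). The helper names and the sample data are illustrative.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Pearson's r applied to the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# A monotonic but strongly nonlinear relationship.
x = [1, 2, 3, 4, 5, 6]
y = [math.exp(v) for v in x]
print(round(pearson_r(x, y), 4), round(spearman_rho(x, y), 4))
```

On this example ρ reaches 1 because the ordering of y matches the ordering of x exactly, while Pearson's r stays below 1 because the relationship is not linear.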
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
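The pair-counting definition above corresponds to the variant usually called tau-a. Here is a small sketch of it, with an illustrative function name and toy data; tied pairs are counted as neither concordant nor discordant. Libraries often default to a tie-adjusted variant (tau-b), so results can differ when the data contain ties.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Tau-a: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # The sign of the product tells whether x and y order the pair the same way.
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
# 6 pairs in total: 5 concordant and 1 discordant (the swapped middle pair).
print(round(kendall_tau(x, y), 4))
```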
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
First rows
| | id | Gender | Umur | Izin_Mengemudi | Kode_Wilayah | Tanggal_Asuransi | Tahun_Kendaraan | Biaya | Sourcing_Channel | Hari_Diasuransikan | Target |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 58609 | Pria | 65.0 | 1.0 | 48.0 | 11/4/2018 | NaN | 2630.0 | 15.0 | 131.0 | 0 |
| 1 | 208222 | Wanita | 22.0 | 1.0 | 21.0 | 2/2/2018 | <1 Tahun | NaN | NaN | NaN | 0 |
| 2 | 345428 | Wanita | 24.0 | 1.0 | NaN | 5/12/2019 | <1 Tahun | NaN | NaN | 181.0 | 0 |
| 3 | 236831 | Pria | 58.0 | 1.0 | 46.0 | NaN | 1-2 Tahun | NaN | 124.0 | NaN | 0 |
| 4 | 280181 | Pria | NaN | 1.0 | 36.0 | 11/19/2019 | >2 Tahun | NaN | NaN | NaN | 1 |
| 5 | 31680 | Pria | 55.0 | 1.0 | 28.0 | 12/10/2019 | 1-2 Tahun | 54135.0 | 52.0 | 285.0 | 0 |
| 6 | 52488 | NaN | 23.0 | 1.0 | 50.0 | NaN | <1 Tahun | NaN | NaN | 145.0 | 0 |
| 7 | 278334 | Pria | 72.0 | 1.0 | 28.0 | 8/26/2019 | >2 Tahun | NaN | 122.0 | 242.0 | 0 |
| 8 | 129322 | Pria | 23.0 | NaN | NaN | 8/16/2019 | <1 Tahun | 33007.0 | 124.0 | NaN | 0 |
| 9 | 316145 | Pria | NaN | 1.0 | NaN | NaN | 1-2 Tahun | 53322.0 | NaN | NaN | 0 |
Last rows
| | id | Gender | Umur | Izin_Mengemudi | Kode_Wilayah | Tanggal_Asuransi | Tahun_Kendaraan | Biaya | Sourcing_Channel | Hari_Diasuransikan | Target |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 382144 | 478163 | Pria | 55.0 | 1.0 | NaN | 3/30/2019 | NaN | NaN | NaN | 152.0 | 0 |
| 382145 | 253277 | Pria | 49.0 | 1.0 | 28.0 | 8/11/2018 | 1-2 Tahun | NaN | 30.0 | 84.0 | 0 |
| 382146 | 93297 | Pria | 51.0 | 1.0 | 33.0 | 6/17/2018 | NaN | NaN | NaN | NaN | 0 |
| 382147 | 335745 | Pria | 21.0 | 1.0 | 47.0 | NaN | <1 Tahun | NaN | 160.0 | NaN | 0 |
| 382148 | 178224 | Wanita | NaN | 1.0 | 28.0 | 12/13/2019 | NaN | 22158.0 | NaN | 40.0 | 0 |
| 382149 | 255964 | Pria | 52.0 | 1.0 | 28.0 | NaN | >2 Tahun | NaN | NaN | 217.0 | 1 |
| 382150 | 102144 | Pria | 23.0 | 1.0 | NaN | 8/27/2018 | <1 Tahun | 29282.0 | 152.0 | 260.0 | 0 |
| 382151 | 480784 | Pria | NaN | 1.0 | 3.0 | 9/12/2019 | NaN | 29217.0 | NaN | NaN | 1 |
| 382152 | 321214 | NaN | 51.0 | 1.0 | NaN | 8/8/2019 | NaN | 42063.0 | 26.0 | 148.0 | 0 |
| 382153 | 372274 | Pria | 57.0 | 1.0 | NaN | 10/24/2019 | 1-2 Tahun | NaN | 26.0 | 215.0 | 0 |